AITopics | semantic-guided multi-attention localization

Semantic-Guided Multi-Attention Localization for Zero-Shot Learning

Neural Information Processing Systems, Dec-25-2025, 02:01:09 GMT

Zero-shot learning extends conventional object classification to the recognition of unseen classes by introducing semantic representations of classes. Existing approaches predominantly focus on learning a proper mapping function for visual-semantic embedding, while neglecting the effect of learning discriminative visual features. In this paper, we study the significance of discriminative region localization. We propose a semantic-guided multi-attention localization model, which automatically discovers the most discriminative parts of objects for zero-shot learning without any human annotations. Our model jointly learns cooperative global and local features from the whole object as well as the detected parts to categorize objects based on semantic descriptions. Moreover, with the joint supervision of an embedding softmax loss and a class-center triplet loss, the model is encouraged to learn features with high inter-class dispersion and intra-class compactness. Through comprehensive experiments on three widely used zero-shot learning benchmarks, we show the efficacy of multi-attention localization, and our proposed approach improves on state-of-the-art results by a considerable margin.
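The abstract mentions an embedding softmax loss without giving its form. As a rough illustration only, one common way to write such a loss is as a cross-entropy over compatibility scores between projected image features and class semantic vectors. The function names, the linear projection `W`, and the array shapes below are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def embedding_softmax_loss(features, labels, class_semantics, W):
    """Cross-entropy over visual-semantic compatibility scores (illustrative sketch).

    features:        (N, D) visual features
    labels:          (N,)   integer class indices into class_semantics
    class_semantics: (C, K) one semantic vector (e.g. attributes) per class
    W:               (D, K) hypothetical linear map from visual to semantic space
    """
    projected = features @ W                    # (N, K)
    scores = projected @ class_semantics.T      # (N, C) compatibility scores
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def predict(features, class_semantics, W):
    """Assign each feature to the class whose semantic vector scores highest.

    Passing the semantic vectors of unseen classes here is what makes the
    classifier usable in a zero-shot setting.
    """
    return np.argmax(features @ W @ class_semantics.T, axis=1)
```

Because classes enter only through their semantic vectors, the same `predict` call can be reused with a different `class_semantics` matrix at test time, which is the property zero-shot methods of this kind rely on.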

name change, semantic-guided multi-attention localization, zero-shot learning, (1 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.78)

Reviews: Semantic-Guided Multi-Attention Localization for Zero-Shot Learning

Neural Information Processing Systems, Jan-21-2025, 23:46:48 GMT

The problem is relevant, and the method is based on an interesting attention-based idea: looking at different regions in the image for the task of zero-shot learning (ZSL). The losses used focus on (i) making each attention map peaky while making different maps diverse, (ii) an embedding-based softmax for better prediction, and (iii) a class-center triplet loss, which pulls features closer to their respective class centers relative to the other class centers. Line 190 mentions that the image and parts are sent to "separate backbone networks", which implies that the network parameters are not shared. If that is the case, the method will have 3x the parameters of competing methods, i.e., a significantly higher-capacity network overall. What happens when the CNN parameters are shared? And what happens when the image-only baseline has a higher-capacity backbone (which is then also fine-tuned end to end)?
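The class-center triplet loss in item (iii) can be sketched as a hinge on squared distances: each feature should be closer to its own class center than to any other class center by at least a margin. This is a minimal sketch under assumptions, not the paper's exact formulation; the function name, the squared Euclidean distance, the `margin` default, and the averaging over all negatives are choices made here for illustration.

```python
import numpy as np

def class_center_triplet_loss(features, labels, centers, margin=1.0):
    """Hinge loss pulling features toward their own class center (illustrative sketch).

    features: (N, D) feature vectors
    labels:   (N,)   integer class indices into centers
    centers:  (C, D) one center per class, with C >= 2
    margin:   hypothetical margin between positive and negative distances
    """
    n = len(labels)
    # Squared Euclidean distance from every feature to every center: (N, C).
    diffs = features[:, None, :] - centers[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    # Distance to the sample's own class center, kept as a column for broadcasting.
    pos = sq_dist[np.arange(n), labels][:, None]
    # Penalize any other center that is not at least `margin` farther away.
    hinge = np.maximum(0.0, margin + pos - sq_dist)
    hinge[np.arange(n), labels] = 0.0  # a class is never its own negative
    return hinge.sum() / (n * (centers.shape[0] - 1))
```

A feature sitting exactly on its own center and far from all others incurs zero loss, while a feature equidistant from two centers incurs a loss equal to the margin.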

semantic-guided multi-attention localization, supplementary section, zero-shot learning, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.43)

Reviews: Semantic-Guided Multi-Attention Localization for Zero-Shot Learning

Neural Information Processing Systems, Jan-21-2025, 23:46:37 GMT

The contribution is interesting and proposes novel ideas for zero-shot learning based on attention mechanisms and on combining different local and global features. The method achieves good results, and the rebuttal helps to strengthen the contribution.

contribution, semantic-guided multi-attention localization, zero-shot learning

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Semantic-Guided Multi-Attention Localization for Zero-Shot Learning

Zhu, Yizhe, Xie, Jianwen, Tang, Zhiqiang, Peng, Xi, Elgammal, Ahmed

Neural Information Processing Systems, Mar-19-2020, 02:47:36 GMT

Zero-shot learning extends conventional object classification to the recognition of unseen classes by introducing semantic representations of classes. Existing approaches predominantly focus on learning a proper mapping function for visual-semantic embedding, while neglecting the effect of learning discriminative visual features. In this paper, we study the significance of discriminative region localization. We propose a semantic-guided multi-attention localization model, which automatically discovers the most discriminative parts of objects for zero-shot learning without any human annotations. Our model jointly learns cooperative global and local features from the whole object as well as the detected parts to categorize objects based on semantic descriptions.

semantic-guided multi-attention localization, zero-shot learning

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
